[python] Add $buckets system table#7989

Merged
JingsongLi merged 1 commit into apache:master from JunRuiLee:pypaimon-buckets-system-table
May 28, 2026
@JunRuiLee
Contributor

Summary

pypaimon currently implements 8 system tables: $snapshots, $schemas, $options, $manifests, $files, $partitions, $tags, $branches. Compared to the Java side, it still lacks $buckets, $audit_log, $read_optimized, $consumers, $statistics, $aggregation_fields, $file_key_ranges, $table_indexes, etc.

This PR adds $buckets, which is one of the more commonly used system tables for diagnosing data skew. It aggregates manifest entries by (partition, bucket) and exposes per-bucket record_count, file_size, file_count, and last_update_time.
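To make the aggregation concrete, here is a minimal sketch of grouping file entries by (partition, bucket). It is not the code in buckets_table.py: the `FileEntry` class, its field names, the use of file creation time for last_update_time, and the sort by (partition, bucket) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical stand-in for one live data file listed in a manifest."""
    partition: Tuple[str, ...]
    bucket: int
    record_count: int
    file_size: int
    creation_time_ms: int


def aggregate_buckets(entries: Iterable[FileEntry]) -> List[dict]:
    """Roll file entries up into one row per (partition, bucket)."""
    acc: Dict[Tuple[Tuple[str, ...], int], dict] = {}
    for e in entries:
        key = (e.partition, e.bucket)
        row = acc.setdefault(key, {
            "partition": e.partition,
            "bucket": e.bucket,
            "record_count": 0,
            "file_size": 0,
            "file_count": 0,
            "last_update_time": 0,
        })
        row["record_count"] += e.record_count
        row["file_size"] += e.file_size
        row["file_count"] += 1
        # Assumption: the newest file creation time stands in for last_update_time.
        row["last_update_time"] = max(row["last_update_time"], e.creation_time_ms)
    # Assumption: rows are ordered by (partition, bucket) for stable output.
    return [acc[k] for k in sorted(acc)]
```

A bucket whose record_count or file_size is far above its siblings in the same partition is the usual sign of skew.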

Changes

  • New: buckets_table.py, the BucketsTable implementation
  • New: buckets_table_test.py — 5 end-to-end tests (schema validation, empty snapshot, aggregation correctness, sort order, catalog dispatch)
  • Modified: system_table_loader.py — register "buckets"
  • Modified: system_table_loader_test.py — update expected table list
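
As a rough illustration of what registering "buckets" and the catalog dispatch test involve, the sketch below splits a table identifier on the `$` separator and checks the suffix against a set of known system tables. The names `SYSTEM_TABLES` and `split_system_table` are hypothetical and do not come from system_table_loader.py; the actual loader may be structured differently.

```python
from typing import Optional, Tuple

# Hypothetical registry: the 8 existing system tables plus the new "buckets".
SYSTEM_TABLES = {
    "snapshots", "schemas", "options", "manifests",
    "files", "partitions", "tags", "branches", "buckets",
}


def split_system_table(identifier: str) -> Tuple[str, Optional[str]]:
    """Split 'orders$buckets' into ('orders', 'buckets').

    Returns (identifier, None) for a plain table name and raises
    ValueError for an unknown system table suffix.
    """
    base, sep, suffix = identifier.partition("$")
    if not sep:
        return identifier, None
    name = suffix.lower()
    if name not in SYSTEM_TABLES:
        raise ValueError(f"Unknown system table: {suffix}")
    return base, name
```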

@JunRuiLee JunRuiLee force-pushed the pypaimon-buckets-system-table branch from 33a99df to 79d7768 Compare May 27, 2026 03:12
Contributor

@leaves12138 leaves12138 left a comment

Thanks for adding the Python $buckets system table. The Python-side implementation and tests look reasonable, but this PR currently reverts the recent SubstringTransform fix from #7987.

In paimon-common/src/main/java/org/apache/paimon/predicate/SubstringTransform.java, the diff changes the null check from the referenced source field index back to column 0:

sourceString = row.isNullAt(0) ? null : row.getString(sourceFieldRef.index());

and it also removes testSubstringRefInputUsesSourceFieldNullability. This reintroduces the bug where SUBSTRING(FieldRef(index > 0), ...) returns null whenever column 0 is null, even if the actual referenced source column is non-null. That is unrelated to $buckets and would regress existing transform behavior.
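
The regression can be reproduced in miniature. The sketch below models a row as a plain Python list and uses Python slicing in place of Paimon's SUBSTRING semantics; both function names are hypothetical and only mirror the shape of the null check, not the actual Java code.

```python
from typing import List, Optional


def substring_checking_column_zero(
    row: List[Optional[str]], source_index: int, begin: int, end: int
) -> Optional[str]:
    """Buggy variant: the null check looks at column 0, not the source column."""
    if row[0] is None:
        return None
    return row[source_index][begin:end]


def substring_checking_source_column(
    row: List[Optional[str]], source_index: int, begin: int, end: int
) -> Optional[str]:
    """Fixed variant: the null check looks at the referenced source column."""
    value = row[source_index]
    if value is None:
        return None
    return value[begin:end]


# Column 0 is null, but the referenced column 1 is not.
row = [None, "hello"]
buggy = substring_checking_column_zero(row, 1, 0, 3)
fixed = substring_checking_source_column(row, 1, 0, 3)
```

With this row, the buggy variant returns None while the fixed variant returns "hel", which is the behaviour difference that the removed test guarded against.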

Please rebase/merge current master and keep #7987's sourceIndex fix and test, or remove the Java SubstringTransform changes from this PR. I ran the new Python tests locally and they passed:

PYTHONPATH=. python -m pytest -q pypaimon/tests/system/buckets_table_test.py pypaimon/tests/system/system_table_loader_test.py
# 10 passed

@JunRuiLee
Contributor Author

JunRuiLee commented May 27, 2026

@leaves12138 Thanks for catching this! I rebased onto the latest master, so the SubstringTransform changes are no longer included in this PR.

@JunRuiLee JunRuiLee force-pushed the pypaimon-buckets-system-table branch from 79d7768 to e18d649 Compare May 27, 2026 06:38
@JingsongLi
Contributor

+1

@JingsongLi JingsongLi merged commit 5b667d8 into apache:master May 28, 2026
6 checks passed
3 participants